- GAWK User Commands GAWK
-
- NAME
- gawk - pattern scanning and processing language
-
- SYNOPSIS
- gawk [ -a ] [ -e ] [ -c ] [ -C ] [ -V ] [ -Ffs ] [ -v
- var=val ] -f program-file [ -- ] file ...
- gawk [ -a ] [ -e ] [ -c ] [ -C ] [ -V ] [ -Ffs ] [ -v
- var=val ] [ -- ] program-text file ...
-
- DESCRIPTION
- Gawk is the GNU Project's implementation of the AWK
- programming language. It conforms to the definition and
- description of the language in The AWK Programming Language,
- by Aho, Kernighan, and Weinberger, with the additional
- features defined in the System V Release 4 version of UNIX
- awk, and some GNU-specific extensions.
-
- The command line consists of options to gawk itself, the AWK
- program text (if not supplied via the -f option), and values
- to be made available in the ARGC and ARGV pre-defined AWK
- variables.
-
- Gawk accepts the following options, which should be
- available on any implementation of the AWK language.
-
- -Ffs Use fs for the input field separator (the value of the
- FS predefined variable).
-
- -v var=val
- Assign the value val to the variable var before
- execution of the program begins. Such variable values are
- available to the BEGIN block of an AWK program.
-
- -f program-file
- Read the AWK program source from the file program-file,
- instead of from the first command line argument.
- Multiple -f options may be used.
-
- -- Signal the end of options. This is useful to allow
- further arguments to the AWK program itself to start
- with a ``-''. This is mainly for consistency with the
- argument parsing convention used by most other System V
- programs.
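-
- As a minimal sketch of the -F and -v options described above
- (the sample input and the variable name n are made up for
- illustration, and the command name awk stands in for gawk):
-

```shell
# -F: sets FS to ":" so that $2 is the second colon-separated field.
# -v n=3 assigns n before the BEGIN block runs, so BEGIN already sees it.
out=$(printf 'root:x:0\n' | awk -F: -v n=3 'BEGIN { print n * 2 } { print $2 }')
echo "$out"
```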
-
- The following options are specific to the GNU
- implementation.
-
- -a Use AWK style regular expressions as described in the
- book. This is the current default, but may not be when
- the POSIX P1003.2 standard is finalized. It is
- orthogonal to -c.
-
- -e Use egrep(1) style regular expressions as described in
- the POSIX standard. This may become the default when the
- POSIX P1003.2 standard is finalized. It is orthogonal
- to -c.
-
- -c Run in compatibility mode. In compatibility mode, gawk
- behaves identically to UNIX awk; none of the
- GNU-specific extensions are recognized.
-
- -C Print the short version of the GNU copyright
- information message on the error output. This option may
- disappear in a future version of gawk.
-
- -V Print version information for this particular copy of
- gawk on the error output. This is useful mainly for
- knowing if the current copy of gawk on your system is
- up to date with respect to whatever the Free Software
- Foundation is distributing. This option may disappear
- in a future version of gawk.
-
- Any other options are flagged as illegal, but are otherwise
- ignored.
-
- An AWK program consists of a sequence of pattern-action
- statements and optional function definitions.
-
-      pattern { action statements }
-      function name(parameter list) { statements }
-
- Gawk first reads the program source from the program-file(s)
- if specified, or from the first non-option argument on the
- command line. The -f option may be used multiple times on
- the command line. Gawk will read the program text as if all
- the program-files had been concatenated together. This is
- useful for building libraries of AWK functions, without
- having to include them in each new AWK program that uses
- them. To use a library function in a file from a program
- typed in on the command line, specify /dev/tty as one of the
- program-files, type your program, and end it with a ^D
- (control-d).
-
- The environment variable AWKPATH specifies a search path to
- use when finding source files named with the -f option. If
- this variable does not exist, the default path is
- ".:/usr/lib/awk:/usr/local/lib/awk". If a file name given
- to the -f option contains a ``/'' character, no path search
- is performed.
-
- Gawk compiles the program into an internal form, executes
- the code in the BEGIN block(s) (if any), and then proceeds
- to read each file named in the ARGV array. If there are no
- files named on the command line, gawk reads the standard
- input.
-
- If a ``file'' named on the command line has the form var=val
- it is treated as a variable assignment. The variable var
- will be assigned the value val. This is most useful for
- dynamically assigning values to the variables AWK uses to
- control how input is broken into fields and records. It is
- also useful for controlling state if multiple passes are
- needed over a single data file.
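-
- The multiple-pass use of command line assignments can be
- sketched as follows. The variable name pass, the temporary
- data file, and its contents are hypothetical; the first pass
- totals a column and the second prints each value as a
- percentage of that total:
-

```shell
# Hypothetical data file: one number per line.
data=$(mktemp)
printf '10\n30\n60\n' > "$data"

# pass=1 is assigned just before the first read of the file and pass=2
# just before the second, so the same file is scanned twice with
# different state.
out=$(awk 'pass == 1 { total += $1 }
           pass == 2 { print $1 * 100 / total }' pass=1 "$data" pass=2 "$data")
rm -f "$data"
echo "$out"
```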
-
- For each line in the input, gawk tests to see if it matches
- any pattern in the AWK program. For each pattern that the
- line matches, the associated action is executed.
-
- VARIABLES AND FIELDS
- AWK variables are dynamic; they come into existence when
- they are first used. Their values are either floating-point
- numbers or strings, depending upon how they are used. AWK
- also has one-dimensional arrays; multiply dimensioned arrays
- may be simulated. There are several pre-defined variables
- that AWK sets as a program runs; these will be described as
- needed and summarized below.
-
- Fields
-
- As each input line is read, gawk splits the line into
- fields, using the value of the FS variable as the field
- separator. If FS is a single character, fields are
- separated by that character. Otherwise, FS is expected to
- be a full regular expression. In the special case that FS
- is a single blank, fields are separated by runs of blanks
- and/or tabs. Note that the value of IGNORECASE (see below)
- will also affect how fields are split when FS is a regular
- expression.
-
- Each field in the input line may be referenced by its
- position, $1, $2, and so on. $0 is the whole line. The value
- of a field may be assigned to as well. Fields need not be
- referenced by constants:
-
-      n = 5
-      print $n
-
- prints the fifth field in the input line. The variable NF
- is set to the total number of fields in the input line.
-
- References to non-existent fields (i.e., fields after $NF)
- produce the null string. However, assigning to a
- non-existent field (e.g., $(NF+2) = 5) will increase the
- value of NF, create any intervening fields with the null
- string as their value, and cause the value of $0 to be
- recomputed, with the fields being separated by the value of
- OFS.
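-
- A small sketch of assigning to a non-existent field. The
- input line is made up, and OFS is set to ``-'' only so that
- the empty intervening field is visible in the output:
-

```shell
# The input has NF = 3. Assigning to $(NF+2) creates an empty $4,
# sets $5, raises NF to 5, and rebuilds $0 using OFS.
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $(NF+2) = 5; print; print NF }')
echo "$out"
```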
-
- Built-in Variables
-
- AWK's built-in variables are:
-
- ARGC the number of command line arguments (does not
- include options to gawk, or the program source).
-
- ARGV array of command line arguments. The array is
- indexed from 0 to ARGC - 1. Dynamically changing
- the contents of ARGV can control the files used
- for data.
-
- ENVIRON
- An array containing the values of the current
- environment. The array is indexed by the
- environment variables, each element being the value of
- that variable (e.g., ENVIRON["HOME"] might be
- /u/arnold). Changing this array does not affect
- the environment seen by programs which gawk spawns
- via redirection or the system function. (This may
- change in a future version of gawk.)
-
- FILENAME
- the name of the current input file. If no files
- are specified on the command line, the value of
- FILENAME is ``-''.
-
- FNR the input record number in the current input file.
-
- FS the input field separator, a blank by default.
-
- IGNORECASE
- Controls the case-sensitivity of all regular
- expression operations. If IGNORECASE has a
- non-zero value, then pattern matching in rules, field
- splitting with FS, regular expression matching
- with ~ and !~, and the gsub(), index(), match(),
- split(), and sub() pre-defined functions will all
- ignore case when doing regular expression
- operations. Thus, if IGNORECASE is not equal to zero,
- /aB/ matches all of the strings "ab", "aB", "Ab",
- and "AB". As with all AWK variables, the initial
- value of IGNORECASE is zero, so all regular
- expression operations are normally case-sensitive.
-
- NF the number of fields in the current input record.
-
- NR the total number of input records seen so far.
-
- OFMT the output format for numbers, %.6g by default.
-
- OFS the output field separator, a blank by default.
-
- ORS the output record separator, by default a newline.
-
- RS the input record separator, by default a newline.
- RS is exceptional in that only the first character
- of its string value is used for separating
- records. If RS is set to the null string, then
- records are separated by blank lines. When RS is
- set to the null string, then the newline character
- always acts as a field separator, in addition to
- whatever value FS may have.
-
- RSTART
- the index of the first character matched by
- match(); 0 if no match.
-
- RLENGTH
- the length of the string matched by match(); -1 if
- no match.
-
- SUBSEP
- the character used to separate multiple subscripts
- in array elements, by default "\034".
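-
- The null-string setting of RS described in the list above
- can be sketched like this. The two-record input is invented
- for illustration:
-

```shell
# Two records separated by a blank line. With RS set to the null
# string, the newline inside each record also separates fields, so
# each record is seen as having three fields.
out=$(printf 'a b\nc\n\nd e\nf\n' | awk 'BEGIN { RS = "" } { print NR ": " NF }')
echo "$out"
```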
-
- Arrays
-
- Arrays are subscripted with an expression between square
- brackets ([ and ]). If the expression is an expression list
- (expr, expr ...) then the array subscript is a string
- consisting of the concatenation of the (string) value of
- each expression, separated by the value of the SUBSEP
- variable. This facility is used to simulate multiply
- dimensioned arrays. For example:
-
-      i = "A" ; j = "B" ; k = "C"
-      x[i, j, k] = "hello, world\n"
-
- assigns the string "hello, world\n" to the element of the
- array x which is indexed by the string "A\034B\034C". All
- arrays in AWK are associative, i.e., indexed by string
- values.
-
- The special operator in may be used in an if or while
- statement to see if an array has an index consisting of a
- particular value.
-
-      if (val in array)
-           print array[val]
-
- If the array has multiple subscripts, use (i, j) in array.
-
- The in construct may also be used in a for loop to iterate
- over all the elements of an array.
-
- An element may be deleted from an array using the delete
- statement.
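-
- The array facilities above can be exercised together in one
- short sketch. The array name x and the helper array part are
- illustrative only:
-

```shell
out=$(awk 'BEGIN {
    x["A", "B"] = "hello"              # stored under the key "A" SUBSEP "B"
    if (("A", "B") in x) print "found" # membership test, multiple subscripts
    for (key in x) {                   # iterate over every subscript
        split(key, part, SUBSEP)       # recover the two original subscripts
        print part[1], part[2], x[key]
    }
    delete x["A", "B"]                 # remove the element again
    if (!(("A", "B") in x)) print "deleted"
}')
echo "$out"
```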
-
- Variable Typing
-
- Variables and fields may be (floating point) numbers, or
- strings, or both. How the value of a variable is interpreted
- depends upon its context. If used in a numeric expression,
- it will be treated as a number, if used as a string it will
- be treated as a string.
-
- To force a variable to be treated as a number, add 0 to it;
- to force it to be treated as a string, concatenate it with
- the null string.
-
- The AWK language defines comparisons as being done numeri-
- cally if possible, otherwise one or both operands are con-
- verted to strings and a string comparison is performed.
-
- Uninitialized variables have the numeric value 0 and the
- string value "" (the null, or empty, string).
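-
- A brief sketch of these typing rules, using made-up variable
- names. The values of a and b come from string constants, so
- they compare as strings until 0 is added to them:
-

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"            # string constants, so a and b are strings
    print (a < b)                # string comparison: "10" sorts before "9"
    print (a + 0 < b + 0)        # adding 0 forces a numeric comparison
    isnum = (x == 0)             # x is uninitialized: numerically 0 ...
    isstr = (x == "")            # ... and also the null string
    print isnum, isstr
}')
echo "$out"
```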
-
- PATTERNS AND ACTIONS
- AWK is a line oriented language. The pattern comes first,
- and then the action. Action statements are enclosed in { and
- }. Either the pattern may be missing, or the action may be
- missing, but, of course, not both. If the pattern is
- missing, the action will be executed for every single line
- of input. A missing action is equivalent to
-
-      { print }
-
- which prints the entire line.
-
- Comments begin with the ``#'' character, and continue until
- the end of the line. Blank lines may be used to separate
- statements. Normally, a statement ends with a newline;
- however, this is not the case for lines ending in a ``,'',
- ``{'', ``?'', ``:'', ``&&'', or ``||''. Lines ending in do
- or else also have their statements automatically continued
- on the following line. In other cases, a line can be
- continued by ending it with a ``\'', in which case the
- newline will be ignored.
-
- Multiple statements may be put on one line by separating
- them with a ``;''. This applies to both the statements
- within the action part of a pattern-action pair (the usual
- case), and to the pattern-action statements themselves.
-
- Patterns
- AWK patterns may be one of the following:
-
-      BEGIN
-      END
-      /regular expression/
-      relational expression
-      pattern && pattern
-      pattern || pattern
-      pattern ? pattern : pattern
-      (pattern)
-      ! pattern
-      pattern1, pattern2
-
- BEGIN and END are two special kinds of patterns which are
- not tested against the input. The action parts of all BEGIN
- patterns are merged as if all the statements had been
- written in a single BEGIN block. They are executed before
- any of the input is read. Similarly, all the END blocks are
- merged, and executed when all the input is exhausted (or
- when an exit statement is executed). BEGIN and END patterns
- cannot be combined with other patterns in pattern
- expressions. BEGIN and END patterns cannot have missing
- action parts.
-
- For /regular expression/ patterns, the associated statement
- is executed for each input line that matches the regular
- expression. Regular expressions are the same as those in
- egrep(1), and are summarized below.
-
- A relational expression may use any of the operators defined
- below in the section on actions. These generally test
- whether certain fields match certain regular expressions.
-
- The &&, ||, and ! operators are logical AND, logical OR, and
- logical NOT, respectively, as in C. They do short-circuit
- evaluation, also as in C, and are used for combining more
- primitive pattern expressions. As in most languages,
- parentheses may be used to change the order of evaluation.
-
- The ?: operator is like the same operator in C. If the first
- pattern is true then the pattern used for testing is the
- second pattern, otherwise it is the third. Only one of the
- second and third patterns is evaluated.
-
- The pattern1, pattern2 form of an expression is called a
- range pattern. It matches all input lines starting with a
- line that matches pattern1, and continuing until a line that
- matches pattern2, inclusive. It does not combine with any
- other sort of pattern expression.
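-
- A range pattern can be sketched with invented marker lines
- START and END. The action is omitted, so each matching line
- is simply printed:
-

```shell
# Prints every line from the first one matching /START/ through the
# next one matching /END/, both inclusive.
out=$(printf 'a\nSTART\nb\nEND\nc\n' | awk '/START/,/END/')
echo "$out"
```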
-
- Regular Expressions
- Regular expressions are the extended kind found in egrep.
- They are composed of characters as follows:
-
- c matches the non-metacharacter c.
-
- \c matches the literal character c.
-
- . matches any character except newline.
-
- ^ matches the beginning of a line or a string.
-
- $ matches the end of a line or a string.
-
- [abc...]
- character class, matches any of the characters
- abc....
-
- [^abc...]
- negated character class, matches any character
- except abc... and newline.
-
- r1|r2
- alternation: matches either r1 or r2.
-
- r1r2 concatenation: matches r1, and then r2.
-
- r+ matches one or more r's.
-
- r* matches zero or more r's.
-
- r? matches zero or one r's.
-
- (r) grouping: matches r.
-
- The escape sequences that are valid in string constants (see
- below) are also legal in regular expressions.
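-
- As an illustration of several of these constructs at once
- (anchors, grouping, alternation, a character class, and the
- ``?'' operator), with a made-up word list as input:
-

```shell
# Matches whole lines that are "gray" or "grey", or "color" with an
# optional "u" before the final "r".
out=$(printf 'color\ncolour\ncolr\ngray\ngrey\ngroy\n' |
      awk '/^(gr[ae]y|colou?r)$/')
echo "$out"
```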
-
- Actions
- Action statements are enclosed in braces, { and }. Action
- statements consist of the usual assignment, conditional, and
- looping statements found in most languages. The operators,
- control statements, and input/output statements available
- are patterned after those in C.
-
- Operators
-
- The operators in AWK, in order of increasing precedence, are
-
- = += -= *= /= %= ^=
- Assignment. Both absolute assignment (var = value)
- and operator-assignment (the other forms) are
- supported.
-
- ?: The C conditional expression. This has the form
- expr1 ? expr2 : expr3. If expr1 is true, the value
- of the expression is expr2, otherwise it is expr3.
- Only one of expr2 and expr3 is evaluated.
-
- || logical OR.
-
- && logical AND.
-
- ~ !~ regular expression match, negated match.
-
- < <= > >= != ==
- the regular relational operators.
-
- blank
- string concatenation.
-
- + - addition and subtraction.
-
- * / %
- multiplication, division, and modulus.
-
- + - !
- unary plus, unary minus, and logical negation.
-
- ^ exponentiation (** may also be used, and **= for
- the assignment operator).
-
- ++ --
- increment and decrement, both prefix and postfix.
-
- $ field reference.
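-
- A few consequences of this precedence order, sketched with
- an invented three-field input line:
-

```shell
out=$(echo '10 20 30' | awk '{
    print $NF - 1      # $ binds tightest: ($NF) - 1
    print $(NF - 1)    # parentheses select the next-to-last field
    print 1 " " 2 + 3  # + binds tighter than concatenation: 1 " " (2 + 3)
    print 2 ^ 3        # exponentiation
}')
echo "$out"
```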
-
- Control Statements
-
- The control statements are as follows:
-
-      if (condition) statement [ else statement ]
-      while (condition) statement
-      do statement while (condition)
-      for (expr1; expr2; expr3) statement
-      for (var in array) statement
-      break
-      continue
-      delete array[index]
-      exit [ expression ]
-      { statements }
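-
- A short sketch combining several of the control statements
- listed above. The input line and the variable names sum, n,
- and s are illustrative:
-

```shell
out=$(echo '1 2 3 4 5 6' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) {
        if ($i % 2) continue     # skip odd fields
        if ($i > 4) break        # stop at the first even field above 4
        sum += $i
    }
    print sum                    # 2 + 4
    n = 3
    do { s = s n; n-- } while (n > 0)   # body runs before the test
    print s
}')
echo "$out"
```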
-
- I/O Statements
-
- The input/output statements are as follows:
-
- close(filename)
- close file (or pipe, see below).
-
- getline
- set $0 from next input record; set NF, NR, FNR.
-
- getline <file
- set $0 from next record of file; set NF.
-
- getline var
- set var from next input record; set NR, FNR.
-
- getline var <file
- set var from next record of file.
-
- next Stop processing the current input record. The next
- input record is read and processing starts over
- with the first pattern in the AWK program. If the
- end of the input data is reached, the END
- block(s), if any, are executed.
-
- print
- prints the current record.
-
- print expr-list
- prints expressions.
-
- print expr-list >file
- prints expressions on file.
-
- printf fmt, expr-list
- format and print.
-
- printf fmt, expr-list >file
- format and print on file.
-
- system(cmd-line)
- execute the command cmd-line, and return the exit
- status. (This may not be available on systems
- besides UNIX and GNU.)
-
- Other input/output redirections are also allowed. For print
- and printf, >>file appends output to the file, while
- | command writes on a pipe. In a similar fashion,
- command | getline pipes into getline. Getline will return 0
- on end of file, and -1 on an error.
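-
- Reading from a pipe with getline might look like the sketch
- below. The shell command held in cmd is an arbitrary example,
- and the loop relies on the return values described above:
-

```shell
out=$(awk 'BEGIN {
    cmd = "echo b; echo a"
    # getline returns 1 for each record read, 0 at end of file and
    # -1 on error, so testing "> 0" also ends the loop on an error.
    while ((cmd | getline line) > 0)
        n++
    close(cmd)                   # close the pipe once it is drained
    print n " lines read"
}')
echo "$out"
```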
-
- The printf Statement
-
- The AWK versions of the printf and sprintf (see below)
- functions accept the following conversion specification
- formats:
-
- %c An ASCII character. If the argument used for %c
- is numeric, it is treated as a character and
- printed. Otherwise, the argument is assumed to be
- a string, and only the first character of that
- string is printed.
-
- %d A decimal number (the integer part).
-
- %i Just like %d.
-
- %e A floating point number of the form
- [-]d.ddddddE[+-]dd.
-
- %f A floating point number of the form [-]ddd.dddddd.
-
- %g Use e or f conversion, whichever is shorter, with
- nonsignificant zeros suppressed.
-
- %o An unsigned octal number (again, an integer).
-
- %s A character string.
-
- %x An unsigned hexadecimal number (an integer).
-
- %X Like %x, but using ABCDEF instead of abcdef.
-
- %% A single % character; no argument is converted.
-
- There are optional, additional parameters that may lie
- between the % and the control letter:
-
- - The expression should be left-justified within its
- field.
-
- width
- The field should be padded to this width. If width
- has a leading zero, then the field will be
- padded with zeros. Otherwise it is padded with
- blanks.
-
- .prec
- A number indicating the maximum width of strings
- or digits to the right of the decimal point.
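-
- The conversions and modifiers above can be tried out with a
- sketch such as the following. All argument values are
- arbitrary:
-

```shell
out=$(awk 'BEGIN {
    # %c with a numeric argument prints the character with that code;
    # %d keeps only the integer part.
    printf "%c|%d|%5.2f|%x|%X|%o\n", 65, 3.9, 3.14159, 255, 255, 8
    # "-" left-justifies, a width pads, and .prec truncates a string.
    printf "[%-6s][%6s][%.2s]\n", "ab", "ab", "abcdef"
    # A width with a leading zero pads with zeros; %% prints one %.
    printf "%05d %%\n", 42
}')
echo "$out"
```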
-
- The dynamic width and prec capabilities of the C library
- printf routines are not supported. However, they may be
- simulated by using the AWK concatenation operation to build
- up a format specification dynamically.
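-
- One way to build such a format specification is sketched
- here. The names width and fmt are hypothetical:
-

```shell
out=$(awk 'BEGIN {
    width = 6                    # width chosen at run time
    fmt = "%" width "s|\n"       # concatenation builds the string "%6s|\n"
    printf fmt, "ab"
}')
echo "$out"
```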
-
- Special File Names
-
- When doing I/O redirection from either print or printf into
- a file, or via getline from a file, gawk recognizes certain
- special filenames internally. These filenames allow access
- to open file descriptors inherited from gawk's parent
- process (usually the shell). The filenames are:
-
- /dev/stdin
- The standard input.
-
- /dev/stdout
- The standard output.
-
- /dev/stderr
- The standard error output.
-
- /dev/fd/n
- The file denoted by the open file descriptor n.
-
- These are particularly useful for error messages. For
- example:
-
-      print "You blew it!" > "/dev/stderr"
-
- whereas you would otherwise have to use
-
-      print "You blew it!" | "cat 1>&2"
-
- These file names may also be used on the command line to
- name data files.
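-
- A sketch of keeping diagnostics apart from normal output.
- The message texts are invented; with an awk that lacks the
- internal handling described above, this relies on the
- operating system providing /dev/stderr:
-

```shell
# Only the standard output is captured; the warning goes to the
# standard error output, which is discarded here.
out=$(awk 'BEGIN {
    print "result"
    print "warning: something odd" > "/dev/stderr"
}' 2>/dev/null)
echo "$out"
```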
-
- Numeric Functions
-
- AWK has the following pre-defined arithmetic functions:
-
- atan2(y, x)
- returns the arctangent of y/x in radians.
-
- cos(expr)
- returns the cosine of expr, which is in radians.
-
- exp(expr)
- the exponential function.
-
- int(expr)
- truncates to integer.
-
- log(expr)
- the natural logarithm function.
-
- rand()
- returns a random number between 0 and 1.
-
- sin(expr)
- returns the sine of expr, which is in radians.
-
- sqrt(expr)
- the square root function.
-
- srand(expr)
- use expr as a new seed for the random number
- generator. If no expr is provided, the time of day
- will be used. The return value is the previous
- seed for the random number generator.
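-
- A quick sketch of the arithmetic functions. The seed 42 is
- arbitrary, and since the actual random value depends on the
- implementation, only its range is checked:
-

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                        # arctangent of 0/-1 is pi
    printf "%.4f %.1f %.1f\n", pi, sin(pi / 2), cos(0)
    print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
    srand(42); r = rand()
    inrange = (r >= 0 && r < 1)
    print inrange
}')
echo "$out"
```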
-
- String Functions
-
- AWK has the following pre-defined string functions:
-
- gsub(r, s, t)
- for each substring matching the regular expression
- r in the string t, substitute the string s, and
- return the number of substitutions. If t is not
- supplied, use $0.
-
- index(s, t)
- returns the index of the string t in the string s,
- or 0 if t is not present.
-
- length(s)
- returns the length of the string s.
-
- match(s, r)
- returns the position in s where the regular
- expression r occurs, or 0 if r is not present, and
- sets the values of RSTART and RLENGTH.
-
- split(s, a, r)
- splits the string s into the array a on the
- regular expression r, and returns the number of
- fields. If r is omitted, FS is used instead.
-
- sprintf(fmt, expr-list)
- prints expr-list according to fmt, and returns the
- resulting string.
-
- sub(r, s, t)
- this is just like gsub, but only the first
- matching substring is replaced.
-
- substr(s, i, n)
- returns the n-character substring of s starting at
- i. If n is omitted, the rest of s is used.
-
- tolower(str)
- returns a copy of the string str, with all the
- upper-case characters in str translated to their
- corresponding lower-case counterparts.
- Non-alphabetic characters are left unchanged.
-
- toupper(str)
- returns a copy of the string str, with all the
- lower-case characters in str translated to their
- corresponding upper-case counterparts.
- Non-alphabetic characters are left unchanged.
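-
- Several of the string functions are shown together in the
- sketch below. The sample strings and the names s, n, m, and
- parts are made up:
-

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "0", s)                # replace every "o" in s
    print n, s
    print index(s, "w")                  # position of "w" in s
    if (match(s, /w[0-9a-z]+/))          # sets RSTART and RLENGTH
        print RSTART, RLENGTH
    print substr(s, 1, 4), toupper(substr(s, 7))
    m = split("a:b:c", parts, ":")       # parts[1] .. parts[3]
    print m, parts[2]
}')
echo "$out"
```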
-
- String Constants
-
- String constants in AWK are sequences of characters enclosed
- between double quotes ("). Within strings, certain escape
- sequences are recognized, as in C. These are:
-
- \\ A literal backslash.
-
- \a The ``alert'' character; usually the ASCII BEL
- character.
-
- \b backspace.
-
- \f form-feed.
-
- \n new line.
-
- \r carriage return.
-
- \t horizontal tab.
-
- \v vertical tab.
-
- \xhex digits
- The character represented by the string of
- hexadecimal digits following the \x. As in ANSI C, all
- following hexadecimal digits are considered part
- of the escape sequence. (This feature should tell
- us something about language design by committee.)
- E.g., "\x1B" is the ASCII ESC (escape) character.
-
- \ddd The character represented by the 1-, 2-, or 3-
- digit sequence of octal digits. E.g., "\033" is the
- ASCII ESC (escape) character.
-
- \c The literal character c.
-
- The escape sequences may also be used inside constant
- regular expressions (e.g., /[ \t\f\n\r\v]/ matches
- whitespace characters).
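-
- A sketch of escape sequences in both string constants and a
- constant regular expression. The strings are arbitrary, and
- the array name f is illustrative:
-

```shell
out=$(awk 'BEGIN {
    print "a\tb"                     # \t is a horizontal tab
    print "\101\102\103"             # octal escapes for A, B and C
    print "back\\slash"              # \\ is one literal backslash
    n = split("x \ty\nz", f, /[ \t\n]+/)   # escapes inside a regexp
    print n
}')
echo "$out"
```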
-
- FUNCTIONS
- Functions in AWK are defined as follows:
-
-      function name(parameter list) { statements }
-
- Functions are executed when called from within the action
- parts of regular pattern-action statements. Actual
- parameters supplied in the function call are used to
- instantiate the formal parameters declared in the function.
- Arrays are passed by reference; other variables are passed
- by value.
-
- Since functions were not originally part of the AWK
- language, the provision for local variables is rather
- clumsy: they are declared as extra parameters in the
- parameter list. The convention is to separate local
- variables from real parameters by extra spaces in the
- parameter list. For example:
-
-      function f(p, q,     a, b) {   # a & b are local
-           ..... }
-
-      /abc/ { ... ; f(1, 2) ; ... }
-
- The left parenthesis in a function call is required to
- immediately follow the function name, without any
- intervening white space. This is to avoid a syntactic
- ambiguity with the concatenation operator. This restriction
- does not apply to the built-in functions listed above.
-
- Functions may call each other and may be recursive.
- Function parameters used as local variables are initialized
- to the null string and the number zero upon function
- invocation.
-
- The word func may be used in place of function.
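-
- The rules above can be sketched with three hypothetical
- functions, fact, fill, and bump. The sketch also uses the
- return statement to hand a value back to the caller:
-

```shell
out=$(awk '
function fact(n,    r) {          # r is a local, declared as an extra parameter
    r = (n <= 1) ? 1 : n * fact(n - 1)    # functions may be recursive
    return r
}
function fill(arr) { arr["k"] = "set" }   # arrays are passed by reference
function bump(v)   { v = v + 1 }          # other variables are passed by value
BEGIN {
    print fact(5)
    a["init"] = 1; fill(a); print a["k"]  # the change made by fill is visible
    x = 1; bump(x); print x               # x is unchanged by bump
}')
echo "$out"
```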
-
- EXAMPLES
- Print and sort the login names of all users:
-
-      BEGIN { FS = ":" }
-            { print $1 | "sort" }
-
- Count lines in a file:
-
-            { nlines++ }
-      END   { print nlines }
-
- Precede each line by its number in the file:
-
-      { print FNR, $0 }
-
- Concatenate and line number (a variation on a theme):
-
-      { print NR, $0 }
-
- SEE ALSO
- egrep(1)
-
- The AWK Programming Language, Alfred V. Aho, Brian W.
- Kernighan, Peter J. Weinberger, Addison-Wesley, 1988. ISBN
- 0-201-07981-X.
-
- The GAWK Manual, published by the Free Software Foundation,
- 1989.
-
- SYSTEM V RELEASE 4 COMPATIBILITY
- A primary goal for gawk is compatibility with the latest
- version of UNIX awk. To this end, gawk incorporates the
- following user visible features which are not described in
- the AWK book, but are part of awk in System V Release 4.
-
- The -v option for assigning variables before program
- execution starts is new. The book indicates that command
- line variable assignment happens when awk would otherwise
- open the argument as a file, which is after the BEGIN block
- is executed. However, in earlier implementations, when such
- an assignment appeared before any file names, the assignment
- would happen before the BEGIN block was run. Applications
- came to depend on this ``feature.'' When awk was changed to
- match its documentation, this option was added to
- accommodate applications that depended upon the old
- behaviour.
-
- When processing arguments, gawk uses the special option
- ``--'' to signal the end of arguments, and warns about, but
- otherwise ignores, undefined options.
-
- The AWK book does not define the return value of srand().
- The System V Release 4 version of UNIX awk has it return the
- seed it was using, to allow keeping track of random number
- sequences. Therefore srand() in gawk also returns its
- current seed.
-
- Other new features are: the use of multiple -f options; the
- ENVIRON array; the \a, \v, and \x escape sequences; the
- tolower and toupper built-in functions; and the ANSI C
- conversion specifications in printf.
-
- GNU EXTENSIONS
- Gawk has some extensions to System V awk. They are
- described in this section. All the extensions described
- here can be disabled by compiling gawk with -DSTRICT, or by
- invoking gawk with the -c option. If the underlying
- operating system supports the /dev/fd directory and
- corresponding files, then gawk can be compiled with
- -DNO_DEV_FD to disable the special filename processing.
-
- The following features of gawk are not available in System V
- awk.
-
- o The special file names available for I/O redirection.
-
- o The IGNORECASE variable and its side-effects.
-
- o The path search for files named via the -f option, and
- therefore the special meaning of the AWKPATH environment
- variable.
-
- o The -a, -e, -c, -C, and -V command line options.
-
- The AWK book does not define the return value of the close
- function. Gawk's close returns the value from fclose(3), or
- pclose(3), when closing a file or pipe, respectively.
-
- When gawk is invoked with the -c option, if the fs argument
- to the -F option is ``t'', then FS will be set to the tab
- character. Since this is a rather ugly special case, it is
- not the default behavior.
-
- BUGS
- The -F option is not necessary given the command line
- variable assignment feature; it remains only for backwards
- compatibility.
-
- There are now too many options. Fortunately, most of them
- are rarely needed.
-
- AUTHORS
- The original version of UNIX awk was designed and
- implemented by Alfred Aho, Peter Weinberger, and Brian
- Kernighan of AT&T Bell Labs. Brian Kernighan continues to
- maintain and enhance it.
-
- Paul Rubin and Jay Fenlason, of the Free Software
- Foundation, wrote gawk, to be compatible with the original
- version of awk distributed in Seventh Edition UNIX. John
- Woods contributed a number of bug fixes. David Trueman of
- Dalhousie University, with contributions from Arnold Robbins
- at Emory University, made gawk compatible with the new
- version of UNIX awk.
-
- ACKNOWLEDGEMENTS
- Brian Kernighan of Bell Labs provided valuable assistance
- during testing and debugging. We thank him.
-
- Free Software Foundation August 24 1989